perm filename CNTOUR[0,BGB]4 blob sn#114000 filedate 1974-08-06 generic text, type C, neo UTF8
{⊂C;<N;αIMAGE CONTOURING.;λ30;P70;I425,0;JCFA} SECTION 7.
{JCFD}    VIDEO IMAGE CONTOURING.
{λ10;W250;JAFA}
	7.0	Introduction to Image Analysis.
	7.1	CRE - An Image Processing Sub-System.
	7.2	Thresholding.
	7.3	Contouring.
	7.4	Polygon Nesting.
	7.5	Contour Segmentation.
	7.6	Related and Future Image Analysis.
{λ30;W0;I950,0;JU;FA}
⊂7.0	Introduction to Image Analysis.⊃

	Simply put, image analysis is the inverse of image synthesis.
From this point of view, the usually difficult question of "analysis
into what?" is answered by the answer to the question "synthesis
from what?". Since a 3-D geometric model is adequate (and necessary)
for synthesizing digital television pictures, it is reasonable to
suppose that a 3-D geometric model is an appropriate subgoal in the
analysis of television pictures. Such an analysis into a 3-D model
would provide  a  useful  data  reduction as  well  as  a  convenient
representation for  solving robotics  problems such as  manipulation,
navigation  and  recognition.   This  approach to  image  analysis is
somewhat heretical, the orthodox method being to extract features
from 2-D images which are used directly to perform the desired task.
On the other hand,  vision by inverse computer graphics may be viewed
as an  extreme  form  of feature  finding,   involving  the  coherent
extraction of a set of basic geometric features which are combined to
form a  3-D model,   which  is a  super feature.   The  rest of  this
introduction enumerates some of the kinds of information available in
a  sequence of images  and some of  the kinds of  data structures for
representing images.  For a first definition,  an image is a 2-D data
structure representing the  contents of a rectangle  from the pattern
of  light formed  by a  thin lens;  a sequence  of images in  time is
called a film.

	Three basic kinds of information in  an image are photometric
information,   geometric  information,  and  topological information.
Fundamentally, geometry concerns distance  measure.  The geometry  of
an image  is based on coordinate  pairs that are associated  with the
elements  that form the  image.  From the  coordinates such geometric
properties as  length,   area,  angle  and moments  can be  computed.
Photometry  means light measure,   although physical  measurements of
light may include power,   hue,  saturation, polarization and  phase;
only the relative  power between points  of the same image  is easily
available  to a  computer using  a television  camera. The  taking of
color images is  possible at Stanford by  means of filters;  however,
the acquisition of color is inconvenient and has not been seriously
pursued  in  the present  work.  Finally,   topology has  to  do with
neighborhoods,   what  is  next  to  what; topological  data  may  be
explicitly  represented by  pointers  between related  entities,   or
implicitly represented by the format of the data.

	Three basic kinds  of image data  structures are the  raster,
the contour map and  the mosaic. A raster image is  a two dimensional
integer valued array of pixels; a pixel ("picture element") is a
single sample position on a vidicon.  Although the real shape of a
pixel is probably that of a blunt ellipse, the fiction that pixels
tessellate the image into little rectangles will be adopted. For other
theoretical purposes the array is assumed to be formed by sampling
and truncating a two argument, smooth, infinitely differentiable real
valued function. A contour image is like a geodesic contour map: no
two contours ever cross  and all contours close.  A  mosaic image (or
tesselation)  is like  a  ceramic tile  mosaic, no  two  regions ever
overlap and the whole image is completely covered with tiles. Further
useful restrictions might be made  concerning whether it is permitted
to  have tiles with holes surrounding smaller  tiles or whether it is
permitted for a  tile to have points  that are thinner than  a single
pixel.

	Given a raster image,  the classical visual analysis approach
is to find the  features.  One canonical  geometric image feature  is
called the <edge> and the places where edges are not found are called
<regions>. For  a naive start,  an edge can  be defined as a locus of
change in the  image function.  Edges  and regions are  complementary
sides of the  same slippery concept; the concept  is slippery because
although a continuous function of two variables and a graph of edges
are well known mathematical objects, the conversion of one into the
other is a poorly understood process that depends greatly on one's
motives and resources.  A sophisticated definition of the region/edge
notion must include a procedure for converting a raster approximation
of  an  image function  into  a region/edge  representation  based on
parameters which are conceptually elegant.

⊂7.1	CRE - An Image Processing Sub-System.⊃

	The acronym CRE stands for "Contour,   Region,  Edge". CRE is
a solution  to the problem of finding contour  edges in a sequence of
television pictures and of  linking corresponding edges and  polygons
from  one picture  to the  next.   The  process is  automatic and  is
intended to run without human intervention. Furthermore,  the process
is bottom up; there are no inputs that anticipate  the content of the
given television images.  The output of CRE is a 2-D contour map data
structure which  is  suitable  input to  the  3-D  modeling  program,
GEOMED. Five design choices that determine the character of CRE are:
{λ10;}
	1. Dumb vision rather than model driven vision.
	2. Multi image analysis rather than single image analysis.
	3. Total image structure imposed on edge finding; rather
	   than separate edge finder and image analyzer.
	4. Automatic rather than interactive.
	5. Machine language rather than higher level language.
{λ30;}
\The design  choices are  ordered from  the more strategic  to
the   more  tactical;   the  first   three  choices   being  research
strategies, the latter two  choices  being  programming  tactics.
Adopting these design choices leads to image contouring and contour
map structures similar to those of [Krakauer] and [Zahn].

	The first design  choice does not  refer to the issue  of how
model dependent a  finished general vision system will be (it will be
quite model dependent),   but rather to the  issue of how one  should
begin  building such  a system. The best  starting
points are  at the two apparent extremes of nearly total knowledge of
a particular  visual  world or  nearly total  ignorance.   The  first
extreme involves  synthesis (by computer graphics) of  a predicted 2-D
image, followed by comparing the predicted and a perceived image  for
slight differences  which are  expected but  not yet  measured.   The
second extreme involves analyzing perceived images into structures
which can  be readily  compared for  near equality  and measured  for
slight differences;  followed by the  construction of a  3-D geometric
model of  the perceived world. The point is that in both cases images
are compared,  and in both cases the 3-D  model initially (or finally)
contains specific  numerical data on the geometry  and physics of the
particular world being looked at.

	The second design choice, of multi image analysis rather than
single  image analysis,    provides a  basis for  solving  for camera
positions and feature  depths.   The third design  choice solves  (or
rather avoids)  the problem of  integrating an edge  finder's results
into  an image. By using a very  simple edge finder, and by accepting
all the edges found, the image structure is never  lost.  This design
postpones the  problem of interpreting photometric  edges as physical
edges. The fourth choice is a resolution to write an image processor
that does not require operator assistance or parameter tuning. The
final design choice of using machine language was for the sake of
implementing node link data structures that are processed 100 times
faster than LEAP, 10 times faster than compiled LISP and that require
significantly  less memory than similar structures  in either LISP or
LEAP. Furthermore machine code assembles and loads faster than higher
level  languages;  and machine  code  can  be  extensively fixed  and
altered without recompiling.

	It is my  impression that CRE  itself does not raise  any new
scientific problems; nor  does it have any extremely new solutions to
the old problems; rather CRE  is another competent video region  edge
finding program with  its own set of tricks.   However, it is further
my  impression that the  particular tricks for  nesting and comparing
polygons in CRE are original programming techniques. As a part of the
larger  feedback  system,  CRE  is  a  necessary,  but  not  entirely
satisfactory implementation of pure bottom up image analysis.

	CRE  consists  of  five  steps:  thresholding,    contouring,
nesting,    smoothing and  comparing.   Thresholding,  contouring and
smoothing perform conversions between two different kinds  of images.
Nesting and comparing compute topological relationships within a
given image representation.  In summary, the major operations and
operands are:

{JV;λ10;}MAJOR OPERATION.	OPERAND.	  RESULT.
	1. THRESHOLDING: 	6-BIT-IMAGE,	  1-BIT-IMAGES.
	2. CONTOURING: 	 	1-BIT-IMAGES,	  VIC-IMAGE.
	3. NESTING:		VIC-IMAGE,	  NESTED-VIC-IMAGE.
	4. SMOOTHING:	 	VIC-IMAGE,	  ARC-IMAGE.
	5. COMPARING:		IMAGE & FILM,	  FILM.
{JU;λ30;}
	The initial  operand is a  6-bit video  raster, which in  the
present implementation is coerced into a window of 216 rows by 288
columns; intermediate operands consist of 1-bit rasters named PAC,
VSEG and HSEG which are explained below, as well as a raster of links
named  SKY which is  used to  perform the polygon  nesting. The final
result is a node/link  structure composed of several kinds  of nodes:
vectors, arcs, polygons, lamtens, levels, images and the film node.

	Although the natural order of operations is sequential from
image thresholding to image comparing, in order to keep memory size
down the first four steps are applied one intensity level at a time
from the darkest cut to the lightest cut (only nesting depends on
this sequential cut order); and comparing is applied to whole images.
Figure 7.1 illustrates an initial video image and its corresponding
contour image.  The contoured image consists of thirteen intensity
levels and took 45 seconds to compute (on a PDP-10, 2-usec memory).
The final CRE data structure was composed of 1996 nodes.

⊂7.2	Thresholding.⊃

	Thresholding, the first and easiest step of CRE,  consists of
two subroutines,   called THRESH and PACXOR.  THRESH converts a 6-bit
image into a 1-bit image with respect to a given threshold  cut level
between zero for black and sixty-three for light. All pixels equal to
or greater than the cut, map into a one; all the pixels less than the
cut, map  into zero. The  resulting 1-bit  image is  stored in a  bit
array  of 216 rows  by 288 columns  (1728 words,   36 bits  per word)
called the PAC  (picture accumulator)  which was named  in memory  of
McCormick's ILLIAC-III.   After THRESH,   the  PAC contains blobs  of
bits.   A blob is defined as "rook's  move" simply connected; that is
every bit of a  blob can be reached  by horizontal or vertical  moves
from any other  bit without having to  cross a zero bit  or having to
make a diagonal (bishop's) move.  Blobs may of course have holes; or
equivalently, a blob always has one outer perimeter polygon, and may
have one, several or no inner perimeter polygons. This blob and hole
topology is recoverable from the CRE data structure and is built by
the nesting step.
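
	The THRESH operation amounts to one comparison per pixel; the
following sketch shows it in present day Python notation (the function
name and the list-of-rows representation are illustrative, not the
actual PDP-10 code):

```python
def thresh(image, cut):
    """THRESH (sketch): map every pixel at or above the cut level to
    one and every pixel below it to zero.  image is a list of rows of
    6-bit intensities, 0 (black) through 63 (light)."""
    return [[1 if pixel >= cut else 0 for pixel in row] for row in image]
```

Applying thresh to a 6-bit raster at each of the thirteen cut levels in
turn yields the sequence of 1-bit PAC images described above.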

	Next,   PACXOR copies the  PAC into  two slightly larger  bit
arrays named HSEG and  VSEG. Then the PAC is shifted down one row and
exclusive OR'ed into the HSEG array; and the PAC is shifted right one
column  and  exclusive OR'ed  into  the  VSEG  array to  compute  the
horizontal and vertical border bits of the PAC blobs.  Notice that
technically this is the very heart of the <edge finder> of CRE;
namely, PACXOR is the mechanism that converts regions into edges,
rasters into contours. Further notice that edge tracing is the only
operation CRE performs on the 1-bit rasters; although Boolean image
processing has caught the eye of many vision programmers (perhaps
because it resembles an array automaton or the game of Life) one rapidly
discovers that  raster operations alone  are too weak to  do anything
interesting  that  can't already  be  done better  analytically  in a
raster  of  n-bit  numbers  or  topologically  in  a  node/link  data
structure.
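
	The shift-and-XOR border extraction can be sketched with each
raster row held as one integer, which is close in spirit to the 36-bit
words of the actual PAC; the bit-per-column convention below is an
illustrative assumption:

```python
def pacxor(pac):
    """PACXOR (sketch): given the PAC as a list of integer rows, with
    bit i of a row holding the pixel in column i, return the border
    bits.  HSEG is each row XORed with the row above it (horizontal
    cracks); VSEG is each row XORed with itself shifted one column
    (vertical cracks).  Both results are one row or column larger."""
    rows = len(pac)
    hseg = [(pac[r] if r < rows else 0) ^ (pac[r - 1] if r > 0 else 0)
            for r in range(rows + 1)]
    vseg = [row ^ (row << 1) for row in pac]
    return hseg, vseg
```

A single-pixel blob, for instance, yields two horizontal border cracks
(top and bottom) and two vertical ones (left and right), as expected.
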
⊂7.3	Contouring.⊃

	Contouring converts the bit arrays HSEG and VSEG into
vectors and polygons.  The contouring itself is done by a single
subroutine named MKPGON, "make polygon".  When MKPGON is called, it
looks for the uppermost left non-zero bit in the VSEG array. If the
VSEG array is empty, MKPGON returns NIL. However, when the bit is
found, MKPGON traces and erases the polygonal outline to which that
bit belongs and returns a polygon node with a ring of vectors. The
MKPGON trace can go in four directions: north and south along vertical
columns of bits in the VSEG array, or  east and west along horizontal
rows  of the HSEG array.  The trace starts by  heading south until it
hits a turn; when heading south MKPGON must check for first a turn to
the east  (indicated by a  bit in HSEG);  next for no  turn (continue
south);  and last for a turn to the  west. When a turn is encountered
MKPGON creates a vector node representing the run of bits between the
previous turn  and the present turn.   The trace  always ends heading
west bound.  The outline so traced can be either the edge of a blob
or of a hole; the two cases are distinguished by looking at the
VIC-polygon's uppermost left pixel in the PAC bit array.
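
	The four direction trace can be sketched as crack following;
this simplified Python version starts heading south from the blob's
uppermost left pixel, as MKPGON does, but works directly from a set of
filled pixels rather than from the HSEG and VSEG bit arrays, and it
returns crack corners rather than vector nodes:

```python
def trace_outline(filled, start):
    """Polygon tracing in the spirit of MKPGON (sketch).  Follows the
    cracks between pixels around one blob, keeping filled pixels on
    the left of the direction of travel; corner (r, c) is the top-left
    corner of pixel (r, c).  filled is a set of (row, col) pixels and
    start must be the blob's uppermost left pixel."""
    # left-front and right-front pixels flanking the crack ahead,
    # for each heading (delta_row, delta_col)
    front = {
        (1, 0):  lambda r, c: ((r, c), (r, c - 1)),          # south
        (0, 1):  lambda r, c: ((r - 1, c), (r, c)),          # east
        (-1, 0): lambda r, c: ((r - 1, c - 1), (r - 1, c)),  # north
        (0, -1): lambda r, c: ((r, c - 1), (r - 1, c - 1)),  # west
    }
    pos, d = start, (1, 0)                 # the trace begins heading south
    corners = []
    while True:
        left_front, right_front = front[d](*pos)
        if left_front not in filled:
            d = (-d[1], d[0])              # nothing on the left: turn left
        elif right_front in filled:
            d = (d[1], -d[0])              # crack ahead is interior: turn right
        if corners and pos == start and d == (1, 0):
            return corners                 # back at the start, heading south
        corners.append(pos)
        pos = (pos[0] + d[0], pos[1] + d[1])
```

Each maximal run of collinear corners in the returned ring corresponds
to one of the vector nodes that the actual trace would create at the
turns.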

	There   are  two  complexities:   contrast  accumulation  and
dekinking.   The  contrast  of  a  vector  is  defined  as  (QUOTIENT
(DIFFERENCE (Sum  of pixel values on  one side of the  vector)(Sum of
pixel values  on the other side of the vector)) (length of the vector
in pixels)).  Since vectors are always either  horizontal or vertical
and  are  construed  as  being  on  the cracks  between  pixels;  the
specified summations refer to the  pixels immediately to either  side
of the vector. Notice that this definition of contrast will always
give a positive contrast for the vectors of a blob and a negative
contrast for the vectors of a hole.
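
	Spelled out as code rather than as a LISP form, the definition
reads (the argument convention, one flanking pixel value per unit of
run length, is an illustrative assumption):

```python
def vector_contrast(one_side, other_side):
    """Contrast of a vector, per the definition above: the difference
    of the pixel sums on the two sides of the vector, divided by its
    length in pixels.  Each argument lists the pixel values
    immediately flanking the vector's cracks."""
    return (sum(one_side) - sum(other_side)) / len(one_side)
```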

	The  contours  that  have  just  been  traced   would  appear
"sawtoothed" or "kinky"; the terms "kink", "sawtooth" and "jaggy" are
used  to express  what  seems to  be wrong  about the  lowermost left
polygon in  figure 7.2.  The problem  involves doing  something to  a
rectilinear quantized set of segments,  to make its continuous nature
more evident. In CRE, the jaggies solution (in the subroutine MKPGON)
merely positions the turning locus diagonally off its grid point
a little in the direction (northeast, northwest, southwest or
southeast) that bisects the turn's right angle.  The distance of
dekink vernier positioning is always less than half a pixel, but
greater for brighter cuts and less for darker cuts, in order to
preserve the nesting of contours.  The sawtoothed and the dekinked
versions of a polygon have the same number of vectors.  I am very
fond of this dekinking algorithm because of its incredible
efficiency; given that you have a north, south, east, west polygon
trace routine (which handles image coordinates packed row, column
into one word), dekinking requires only one more ADD instruction
execution per vector!
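
	A sketch of the dekink geometry: for unit step headings into
and out of a right-angle turn, the difference of the two headings
points along the bisecting diagonal.  In the actual implementation the
packed row,column coordinate makes this nudge a single ADD of a
constant; here the offset is computed explicitly, and the fixed eps
stands in for the cut-dependent vernier distance:

```python
def dekink(turns, eps=0.25):
    """Dekinking (sketch): nudge each turning point off its grid point
    along the diagonal bisecting the turn's right angle.  turns is the
    closed ring of integer (row, col) turn points; eps must stay below
    half a pixel."""
    def unit(a, b):
        # componentwise sign of the step from point a to point b
        return ((b[0] > a[0]) - (b[0] < a[0]), (b[1] > a[1]) - (b[1] < a[1]))
    out = []
    n = len(turns)
    for i in range(n):
        d_in = unit(turns[i - 1], turns[i])          # heading into the turn
        d_out = unit(turns[i], turns[(i + 1) % n])   # heading out of it
        r, c = turns[i]
        # d_out - d_in points NE, NW, SW or SE, bisecting the right angle
        out.append((r + eps * (d_out[0] - d_in[0]),
                    c + eps * (d_out[1] - d_in[1])))
    return out
```

On the four corners of a unit square outline, every vertex moves
diagonally inward by the same amount, so the vector count is unchanged
and nesting is preserved.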

⊂7.4	Polygon Nesting.⊃

	The nesting problem is to decide  whether one contour polygon
is within another.  Although easy in the two polygon case, solving
the nesting of many polygons with respect to each other becomes
n-squared expensive in either compute time or in memory space.  The
nesting  solution in  CRE sacrifices memory  for the  sake of greater
speed and requires a 31K array, called the SKY. CRE's accumulation of
a properly nested tree of  polygons depends on the order of threshold
cutting going from  dark to  light. For  each polygon  there are  two
nesting steps:  first, the polygon  is placed in  the tree  of nested
polygons by the  subroutine INTREE; second,  the polygon is placed in
the SKY array by the subroutine named INSKY.

	The SKY array is 216 rows of 289 columns of  18-bit pointers.
The name "SKY" came about because the array is used to represent the
furthest away regions or background, which in the case of a robot
vehicle is the real blue sky.  The sky contains vector pointers; and
would  be more  efficient  on a  virtual memory  machine  that didn't
allocate unused pages of its address space.  Whereas most computers
have more memory containers than address space, computer graphics and
vision might be easier to program in a memory with more address space
than physical space; i.e., an almost empty virtual memory.

	The first part of the INTREE routine finds the  surrounder of
a given polygon by scanning the  SKY due east from the uppermost left
pixel  of the  given polygon.   The  SON of  a polygon is  always its
uppermost  left vector.  After  INTREE,   the  INSKY  routine  places
pointers to  the vertical vectors of  the given polygon  into the sky
array. The second part of the INTREE routine checks for and fixes  up
the case  where the new  polygon captures a  polygon that  is already
enclaved. This only happens when two or more levels of the image have
blobs that  have  holes.   The  next  paragraph explains  the  arcane
details of fixing up the tree links of multi level hole polygons; and
may  be skipped by everyone  but those who might  wish to implement a
polygon nester.
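
	The two SKY routines can be sketched as follows; small integer
polygon ids stand in for the 18-bit vector pointers, and only the
eastward surrounder scan of INTREE's first part is shown (the capture
fix-up of the second part is omitted):

```python
def insky(sky, poly_id, vertical_vectors):
    """INSKY (sketch): record the polygon's vertical vectors in the
    SKY.  sky is a 2-D list of polygon ids (None for empty); each
    vertical vector is (row_start, row_end, col), covering rows
    row_start .. row_end - 1."""
    for r0, r1, col in vertical_vectors:
        for r in range(r0, r1):
            sky[r][col] = poly_id

def find_surrounder(sky, row, col):
    """First part of INTREE (sketch): scan the SKY due east from the
    polygon's uppermost left pixel; the first vector hit belongs to
    the surrounding polygon.  Returns None when the scan runs off the
    image, i.e. the polygon is at the top level of the tree."""
    for c in range(col + 1, len(sky[row])):
        if sky[row][c] is not None:
            return sky[row][c]
    return None
```

Because cuts are processed from dark to light, a polygon's true
surrounder is always already in the SKY when the scan is made.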

	Let the given polygon  be named Poly; and let  the surrounder
of Poly be  called Exopoly; and assume that Exopoly surrounds several
enclaved polygons called  "endo's", which are  already in the  nested
polygon tree. Also, there are two  kinds of temporary lists named the
PLIST and  the NLIST. There is one PLIST which is initially a list of
all the ENDO polygons on  Exopoly's ENDO ring. Each endo in  turn has
an NLIST which is initially empty.  The subroutine INTREE re-scans
the sky array for the polygon due east of the uppermost left vector
of each endo polygon on the PLIST (Exopoly's ENDO ring).  On such
re-scanning (on behalf of, say, Endo1), there are four cases:
{λ10;W100,1160;}
\1. No change; the scan returns Exopoly,
which is Endo1's original EXO.

\2. Poly captures Endo1;
the scan returns Poly, indicating that Endo1 has been captured by Poly.

\3. My brother's fate; the scan hits an Endo2 which
is not on the PLIST; which means that Endo2's EXO is valid
and is also the valid EXO of Endo1.

\4. My fate delayed; the scan hits an Endo2 which is still
on the PLIST; which means that Endo2's EXO is not yet
valid but when discovered it will also be Endo1's EXO;
so Endo1 is CONS'ed into Endo2's NLIST.
{λ30;W0,1260;}
\When an endo  polygon's  EXO has  been  re-discovered, then  all  the
polygons on  that endo's NLIST are also  placed into the polygon tree
at that place. All of this link crunching machinery takes half a page
of code and is not frequently executed.

⊂7.5	Contour Segmentation.⊃

	In  CRE  the  term <segmenting>  refers  to  the  problem  of
breaking a 1-D manifold (a polygon) into simple functions (arcs). The
segmenting step converts the polygons of vertical and horizontal
vectors into polygons of arcs.  For the present, the term "arc" means
"linear arc", which is a line segment. Fancier arcs (circular and
cubic spline) were implemented and thrown out, mostly because they were
of no use to higher processes such as the polygon compare, which would
have to break the fancy arcs back down into linear vectors for
computing areas, inertia tensors or mere display buffers.

	Segmenting is applied to each polygon of  a level.  To start,
a ring of two arcs is formed (a bi-gon) with one arc at the uppermost
left and  the  other  at the  lowermost  right of  the  given  vector
polygon. Next a recursive make arc operation, MKARC, is applied to
the two initial arcs. Since the arc given to MKARC is in a one to one
correspondence with a doubly linked list of vectors, MKARC checks to
see whether each point on the list of vectors is close enough to the
approximating arc.  MKARC returns the  given arc as good enough  when
all the sub vectors fall within a given width; otherwise MKARC splits
the arc in two  and places a new arc vertex on the vector vertex that
was furthest away from the original arc.
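
	The recursive split can be sketched as follows; CRE starts
from a two-arc ring and varies the tolerance with contrast, whereas
this illustrative version takes a fixed tolerance and an open chain of
points:

```python
import math

def mkarc(points, i, j, tol):
    """Recursive arc splitting in the spirit of MKARC (sketch).  The
    arc from points[i] to points[j] is returned as good enough when
    every intermediate point lies within tol of its chord; otherwise a
    new arc vertex is placed on the point furthest from the chord and
    both halves are split recursively.  Returns the indices of the
    retained arc vertices, i and j included."""
    (r0, c0), (r1, c1) = points[i], points[j]
    dr, dc = r1 - r0, c1 - c0
    chord = math.hypot(dr, dc) or 1.0
    far, worst = None, tol
    for k in range(i + 1, j):
        r, c = points[k]
        # perpendicular distance from points[k] to the chord
        dist = abs(dr * (c - c0) - dc * (r - r0)) / chord
        if dist > worst:
            far, worst = k, dist
    if far is None:
        return [i, j]                     # good enough: keep one arc
    return mkarc(points, i, far, tol)[:-1] + mkarc(points, far, j, tol)
```

On a dekinked staircase the whole run collapses to a single arc once
the tolerance passes the stair depth, which is why a tolerance at or
below one pixel still greatly reduces the vector count.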

	The two large images in figure 7.3 illustrate a polygon
smoothed with arc width tolerances set at two different widths in
order  to show  one  recursion of  MKARC.   The eight  smaller images
illustrate the  results of  setting the  arc width  tolerance over  a
range of  values. Because of the dekinking  mentioned earlier the arc
width tolerance can  be equal  to or less  than one  pixel and  still
yield a substantial  reduction in the  number of vectors it  takes to
describe a contour polygon.

	A final  important detail is that the  arc width tolerance is
actually taken as  a function  of the highest  contrast vector  found
along the  arc; so  that high  contrast arcs  are smoothed  with much
smaller  arc  width  tolerances than  are  low  contrast  arcs. After
smoothing, the contrast across each  arc is computed and the  ring of
arcs  replaces the ring  of vectors  of the given  polygon. (Polygons
that would be expressed as only two arcs are deleted).{Q}

⊂7.6	Related and Future Image Analysis.⊃

	Image analysis  simply consists of  three steps:  first, high
quality  pictures are taken  continuously in time  and space; second,
several low level bulk  operations are applied  to each image and  to
pairs of  images such as  correlation, filtering,   histogramming and
thresholding; third, the rasters are converted into linked 2-D
structures which can be further linked to form a coherent 3-D
geometric model.  It is clear to me that my present implementation
has only a few of the necessary parts; more kinds of image features
and larger coherent structures have to be included.  In particular,
the contour maps should be bundled into regional mosaics, and more
features should be woven into the node/link structure, such as
parallax edges and high variance windows.
	
	Contour image processing is effectively surveyed by [Freeman]
who   gives  the  impression   that  contour  images   are  the  only
line-drawing representation. Contours  are applied to recognition  of
silhouettes by  [Dudani] using moments similar to  those explained in
the next chapter. Finally, my own acquaintance with the contour image
representation was initially derived from papers by [Zahn] and
[Krakauer].